(57) Abstract

A method of signal processing for time-scale and/or pitch modification of audio signals is disclosed. The method involves encoding
and resynthesising a waveform. The waveform is sampled into a series of frames, and each frame is multiplied by a windowing
function whose peak is centred at approximately the zero point of the frame. The windowed frame is then
subjected to a Fast Fourier Transform, producing a frequency-domain waveform. The resultant waveform is convolved with a variable
kernel function whose specification varies with frequency. Local maxima and associated minima in the magnitude
spectrum of each convolved frame are located, so that each local maximum and its associated minima define a region. Each region
corresponds to a frequency component of the signal. Each region is analysed separately in the frequency-domain representation by
summing the complex frequency components (bins) falling within the region into a signal vector. The variable kernel function can
be varied to achieve a different trade-off between frequency and temporal resolution across the frequency range of the signal.
SIGNAL PROCESSING TECHNIQUES FOR TIME-SCALE AND/OR PITCH
MODIFICATION OF AUDIO SIGNALS

Field of the Invention 

The present invention relates to the encoding and manipulation of digital signals.
More particularly, although not exclusively, the present invention relates to
time-scale and/or pitch modification of audio signals. However, the signal
analysis and re-synthesis method described herein is not limited to audio
signals. It is envisaged that the present invention may find application in the
coding of other signals with the (wavelet-like) method described herein. An
example of such an application is image compression. Essentially, the
present invention may be applied wherever one wishes to analyse different
regions of the frequency domain simultaneously with differing temporal/spatial
resolutions.

Background to the Invention 

There are a number of existing techniques for time-scale/pitch modification of
audio signals which are known in the art. These can be broadly classified as
follows.

(a) Time domain methods: 

These techniques attempt to estimate the fundamental period of a musical
signal by detecting periodic activity in the audio signal. In this process, an
input signal is delayed and multiplied by the undelayed signal, and the product
is then smoothed in a low-pass filter to provide an approximate measure
of the autocorrelation function. The autocorrelation function is then used to
detect a nonperiodic signal or a weak periodic signal which might be hidden in
the noise. Once the fundamental period of the musical signal is found, the
process is repeated and the analysed sections of the signal are overlapped. A
significant disadvantage of these techniques is that most audio signals do not
have a fundamental period. For example, polyphonic instruments, recordings
with reverberation and percussion sounds do not have an identifiable
fundamental period. Further, when applying such methods, transients in the
music are repeated, which leads to notes having multiple starts and ends.
Another problem with this technique is that overlapping the delayed sections
of the music can produce an audio effect that is metallic or mechanical, or
that has an echo-like character.
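The autocorrelation period search described above can be sketched in Python. It illustrates the prior-art time-domain approach, not the present invention. The function name `estimate_period` and the lag search range are illustrative assumptions, and a plain sum over the delayed product stands in for the low-pass smoothing.

```python
import math

def estimate_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] maximising r(lag) = sum x[n] * x[n + lag]."""
    best_lag, best_r = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Delay by `lag`, multiply by the undelayed signal, then sum; the sum
        # stands in for the low-pass smoothing of the product.
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag

fs = 8000
tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
period = estimate_period(tone, 20, 200)   # a 100 Hz tone at 8 kHz repeats every 80 samples
```

A polyphonic or reverberant recording has no single dominant lag, so the search returns an arbitrary value. This is the failure mode described above.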



(b) Sinusoidal analysis methods: 

These techniques assume that the input signal is made up from pure sinusoids. 
The inherent disadvantage of such a method is therefore self-evident. 


Sinusoidal analysis techniques use short-time Fast Fourier Transforms (FFTs)
to estimate the frequencies of the component sinusoids. The derived signal is
then synthesised with a bank of tone generators to produce the desired output.
Short-time Fourier analysis captures information about the frequency content
of a signal within a time interval governed by the window function chosen.
A significant disadvantage of such techniques is that a single time-domain
window is applied to all the frequency content of the signal, so the signal
analysis cannot correspond accurately to human perception of the signal
content. Also, conventional sinusoidal analysis methods use a local maxima
search of the magnitude spectrum, taking into account the relative phase
changes between analysis frames, to determine the frequencies of the
constituent sinusoids. This technique ignores any side-band information located
around each local maximum. As a result, any signal modulation
occurring within a single analysis frame is excluded, which smears the sound
and causes an almost complete loss of transients. An example of such a transient, in
the audio context, is a guitar pluck.
(c) Phase vocoder methods: 

This type of technique uses a Fast Fourier Transform as a large bank of filters
and treats the output of each filter separately. The relative phase
change between two consecutive analyses of the input is used to estimate the
frequency of the signal content within each bin. A resulting frequency-domain
signal is synthesised from this information, treating each bin as a separate
signal. In contrast to sinusoidal analysis techniques, this method retains the
spectral energy distribution of the original signal. However, it destroys the
relative phase of any transient information. Therefore, the resulting sound is
smeared and echo-like.
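The per-bin frequency estimate that a phase vocoder derives from the phase change between consecutive analyses can be sketched as follows. This is the standard textbook phase-vocoder calculation, not anything specific to this document. The function names, hop size and test tone are illustrative assumptions, and the tone's own phase stands in for a measured bin phase.

```python
import math

def princarg(phi):
    """Wrap a phase value to the interval [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi

def bin_frequency(prev_phase, cur_phase, k, fft_size, hop, fs):
    """Estimate the frequency (Hz) of the content of FFT bin k from the phase
    change between two analyses taken `hop` samples apart."""
    omega_k = 2 * math.pi * k / fft_size            # bin centre, rad/sample
    expected = omega_k * hop                        # advance if content sat at the bin centre
    deviation = princarg(cur_phase - prev_phase - expected)
    return (omega_k + deviation / hop) * fs / (2 * math.pi)

def tone_phase(t, f0=1010.0, fs=6400):
    """Phase of a steady 1010 Hz tone at sample t (stands in for a measured bin phase)."""
    return 2 * math.pi * f0 * t / fs

# Bin 10 of a 64-point FFT at 6400 Hz is centred on 1000 Hz; the hop is 16 samples.
estimate = bin_frequency(tone_phase(0), tone_phase(16), 10, 64, 16, 6400)
```

Because every bin is treated as an independent signal, nothing ties together the phases of bins that belong to the same transient. This is the source of the smearing noted above.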

In view of the prior art techniques, it would therefore be desirable to analyse
and process audio signals so that the resultant output retains the tonal
characteristics of the original signal and accurately captures
transient sounds without smearing or introducing an echo-like character to the
output signal.

Accordingly, it is an object of the present invention to provide a technique for
processing audio signals which achieves the above-mentioned aims,
ameliorates at least some of the disadvantages inherent in the prior art, or at
least provides the public with a useful choice. Further, it is an object of the
invention to provide a signal analysis and synthesis method that can also be
applied to the coding of signals in general.

Disclosure of the Invention 



In one aspect the invention provides for a method of encoding and
re-synthesising a waveform, the method including:

sampling the waveform to obtain a series of discrete samples and
constructing therefrom a series of frames, each frame spanning a
plurality of samples;

multiplying each frame by a windowing function, preferably a raised
cosine, wherein the peak of the windowing function is centred
substantially at a zero point of each frame;

applying a Fast Fourier Transform to each frame, thereby producing a
frequency-domain waveform;

convolving the resultant frequency-domain data with a variable kernel
function whose specification varies with frequency;

locating local maxima and surrounding minima in the magnitude
spectrum of each convolved frame, wherein the local maxima and their
associated minima define a plurality of regions, each region
corresponding to a frequency component of the signal; and

analysing each of the regions in the frequency-domain representation
separately by summing the complex frequency components of the bins
falling within the defined region into a signal vector; wherein the
variable kernel function can be varied to achieve a differing
trade-off between frequency and temporal resolution across the
frequency range of the signal.

In a preferred embodiment, the waveform corresponds to a digitised audio-frequency
waveform, and the kernel function may be varied to approximate
the perceptual characteristics of the human ear.

In the case where the waveform corresponds to an audio signal, the location of
each maximum corresponds to the perceived pitch of the frequency component.

The method may further include the step of manipulating the signal while it is
represented as signal vectors.
Such manipulation may take the form of modifying pitch or time scale (in an
audio signal) or further data reduction adapted for efficient signal storage
and/or transmission.

In the case of modifying an audio signal, the frequency location and phase of
analysed signal vectors can be shifted as necessary to achieve a scaling of time
and/or pitch.

Conversion back to the sampled time-domain representation of the signal may
be achieved by accumulating into the frequency domain an equivalent signal
whose components correspond to the signal vectors determined in the
analysis of the original signal.

Preferably an Inverse Fast Fourier Transform may be applied so as to give a
time-domain signal that may be suitably windowed and accumulated to
produce the decoded signal.

Preferably the form of the convolution function is determined empirically by
subjectively assessing the quality of the synthesised output.


Preferably the application of the kernel function to the frequency domain data 
is implemented as a single-pole low-pass filter operation on said data, the 
pole's location being varied with frequency. 
Preferably, in the case of the analysis of audio signals, the pole may be
specified by a control function s(f) of the form:

$$s(f) = 0.4 + 0.26\,\arctan\bigl(4\ln(0.1f) - 18\bigr)$$

where f is the frequency in hertz (cycles per second).

The frequency-domain filter may be specified by the relation:

$$y_{\mathrm{out}}(f) = \bigl[1 - s(f)\bigr]\,y_{\mathrm{in}}(f) + s(f)\,y_{\mathrm{out}}(f-1)$$

Preferably, for the purposes of manipulating an audio signal, each signal vector
is treated separately. For pitch shifting, the frequency of the component is
multiplied by a real-valued pitch factor. For both pitch shift and time-scale
modification, the phase shift necessary for glitch-free reconstruction is
calculated and applied.

Preferably the method includes the further steps of:

zeroing a frequency-domain output array and, for each analysed
frequency component represented as an analysed signal vector:

mapping the real-valued frequency to the two nearest integer-valued
frequency bins; and

distributing the analysed signal vector between the two bins, each in
proportion to 1 minus the distance between the real-valued frequency and
that bin's location.

In an alternative aspect, the resulting regions may be translated in frequency,
so that the location of the maximum is scaled while the surrounding region is
translated.

Each region has a maximum and first and second associated minima. For pitch
shifting of an audio signal, the location of each maximum in the frame
is scaled by the pitch shift factor, and the associated harmonic information
between the first and second minima is translated to the respective positions
around the scaled maximum.

To time-stretch or compress the signal, each maximum is retained in the same
location in the frequency domain while the band of frequency-domain or
harmonic information associated with the maximum is stretched or compressed.
This stretches the amplitude and frequency modulation of the harmonics
while preserving the pitch of the input signal.

The method may further include the steps of:

resampling the data in each of the frames into a plurality of bins; and

mapping each bin to a real-valued location in an output frame. For a bin x
lying within a band with a maximum at frequency freq_max, the real-valued
location y in the output frequency domain is given by an expression in x,
freq_max, shift and scale, where shift is the frequency shift and scale is the
time expansion ratio.

Preferably, y is rounded down to the largest integer z less than or
equal to y. Output bins z and z+1 are then added to, each in proportion to 1
minus the difference between y and that bin's integer location.

In a further aspect, the invention provides for software adapted to perform the
above-mentioned method.

In a further aspect, the invention provides for hardware adapted to perform the
above-mentioned method.



Brief Description of the Drawings 

The invention will now be described by way of example only and with 
reference to the drawings in which: 



Figure 1: illustrates a simplified schematic block diagram of an embodiment of the method of the invention (split over pages 28 to 30);

Figure 2: illustrates a simplified schematic block diagram of an embodiment of the alternate method of the invention (split over pages 31 to 33);

Figure 3: illustrates a schematic diagram of the process of searching for the maxima/minima;

Figures 5a and 5b: illustrate pitch and time stretching in respect of two maxima.



Referring to figure 1, a simplified flowchart illustrates the overall steps in an
embodiment of the method of signal processing. For clarity, the schematic is
split over pages 15 to 17.

An input audio signal is digitised into frames 10. Each of these frames is then 
processed as follows: 

Each frame 10 is windowed (20) with, for example, a wide cosine function 30,
producing a time-domain modulated representation of the input signal frame 10.
A Fast Fourier Transform 50 is then applied to the frame, producing a
frequency-domain representation of the input signal 60.
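The windowing and transform steps can be sketched in Python. The sketch assumes that centring the window peak "at the zero point of the frame" means rotating the frame so that its centre sample sits at index 0, then applying a raised cosine whose peak is also at index 0 (zero-phase windowing). The function names and the small recursive radix-2 FFT are illustrative and not taken from the source.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def analyse_frame(frame):
    """Window a frame with a raised cosine peaking at sample 0, then FFT.

    Assumption: the frame (a list) is rotated so its centre sample lands at
    index 0, and the window 0.5 + 0.5 cos(2 pi i / n) has its peak there.
    """
    n = len(frame)
    rotated = frame[n // 2:] + frame[:n // 2]           # centre sample -> index 0
    window = [0.5 + 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    return fft([s * w for s, w in zip(rotated, window)])
```

For example, an impulse at the centre of an 8-sample frame becomes an impulse at index 0 after rotation, so its spectrum is flat and purely real. With this layout, smoothing the spectrum corresponds to a narrower time-domain window that stays centred on the frame.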

The frequency-domain data 60 is then filtered with a filtering function 71
parameterised by s(f). In the present example, the filtering function may also
be viewed as a low-pass single-pole filter. The function s(f) 70 specifies how the
behaviour of the filter varies with frequency. The filtering function 71 can be
described by the recursive relation:

$$y_{\mathrm{out}}(f) = \bigl[1 - s(f)\bigr]\,y_{\mathrm{in}}(f) + s(f)\,y_{\mathrm{out}}(f-1)$$

Thus s(f) controls the 'severity' of the filter 71, so in effect a different
convolution kernel is used for each frequency bin. The real and imaginary
components of each bin are convolved separately. In the present exemplary
embodiment, the filtering or convolution function 71 has the effect of
"blurring" the frequency-domain information, and the convolving
function can therefore be referred to as a blurring function. Blurring or spreading the
frequency-domain data corresponds to a narrowing of the equivalent window
in the time-domain frame. Therefore each frequency bin of the Fast Fourier
Transform is effectively calculated as if a different-sized time-domain window
had been applied before the FFT operation.

The effect of the filter does not have to be to blur the data. For example, 
translating the time domain samples by half the window size would make it 
necessary to high-pass filter the frequency domain data, to achieve the same 
equivalent windowing in the time domain. 


The frequency domain filter 71 is applied to each bin in ascending order and 
then applied in descending order of frequency bin. This is to ensure that no 
phase shift is introduced into the frequency domain data. 
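The two-pass application of the recursive filter can be sketched as below. The pole values s(f) are passed in as a per-bin list. Leaving the lowest bin unchanged in the ascending pass, and the highest bin unchanged in the descending pass, is an assumption about the boundary conditions, which the text does not specify. Running the recursion in both directions makes the overall smoothing symmetric about each bin, so features are not shifted up or down in frequency.

```python
def variable_smooth(spectrum, s):
    """Frequency-varying single-pole smoothing across FFT bins.

    spectrum: list of complex bins; s: per-bin pole values s(f).
    Ascending pass:  y(f) = (1 - s(f)) y_in(f) + s(f) y(f - 1)
    Descending pass: the same recursion run from the top bin downwards.
    Because s(f) is real, real and imaginary parts are filtered independently.
    """
    y = list(spectrum)
    n = len(y)
    for f in range(1, n):                    # ascending order of frequency bin
        y[f] = (1 - s[f]) * y[f] + s[f] * y[f - 1]
    for f in range(n - 2, -1, -1):           # then descending order
        y[f] = (1 - s[f]) * y[f] + s[f] * y[f + 1]
    return y
```

A larger s(f) spreads each bin over more of its neighbours. This is the "blurring" that corresponds to a shorter equivalent time window at that frequency.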
A key aspect of the present invention is that, when processing audio-frequency
data, the control function s(f) is chosen so as to approximate the
excitation response of the cilia located on the basilar membrane of the
human ear. In effect, the function s(f) is chosen so as to approximate the
time/frequency response of the human ear.

The form of the control function s(f) is, in the present preferred embodiment,
determined empirically by gauging the quality of the output (synthesised)
waveform under varying circumstances. Although this is a subjective
procedure, repeated and varied evaluations of the quality of the synthesised
sound have been found to produce a highly satisfactory convolution function.

A preferred form of the control function s(f) is:

$$s(f) = 0.4 + 0.26\,\arctan\bigl(4\ln(0.1f) - 18\bigr)$$

where f is the frequency in hertz (cycles per second).
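The preferred control function is easy to tabulate per FFT bin. In the sketch below, the 1024-point, 44.1 kHz setup is hypothetical. Clamping frequencies below 1 Hz (ln(0.1f) is undefined at f = 0) is also an assumption. Evaluating the formula shows s(f) rising monotonically from about 0.02 at 100 Hz to about 0.50 at 1 kHz and about 0.78 at 10 kHz. The frequency-domain blurring, and therefore the shortening of the equivalent time window, thus increases with frequency.

```python
import math

def control_s(f_hz):
    """Preferred control function s(f) = 0.4 + 0.26 arctan(4 ln(0.1 f) - 18).

    ln(0.1 f) is undefined at f = 0, so frequencies below 1 Hz are clamped to
    1 Hz (an assumption of this sketch, not part of the source formula).
    """
    f = max(f_hz, 1.0)
    return 0.4 + 0.26 * math.atan(4 * math.log(0.1 * f) - 18)

# Pole value for every bin of a hypothetical 1024-point FFT at 44.1 kHz.
fs, n = 44100, 1024
s = [control_s(k * fs / n) for k in range(n // 2 + 1)]
```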

In effect, the aforementioned steps are analogous to an efficient way to process
a signal through a large bank of filters where the bandwidth of each filter is
individually controllable by the control function s(f).
Once the filter 71 is applied, the convolved frequency-domain data 80 is
analysed (90) to determine the locations of local maxima and the associated
local minima.

It has been found more efficient to perform this step on the intensity
spectrum. For each frequency f, the data has a local maximum
if I(f) > I(f - 1) and I(f) > I(f + 1), and a local minimum if I(f) < I(f - 1)
and I(f) < I(f + 1). Here

$$\mathrm{Mag}(f) = \sqrt{\mathrm{real}(f)^2 + \mathrm{im}(f)^2}, \qquad I(f) = \mathrm{Intensity}(f) = \mathrm{real}(f)^2 + \mathrm{im}(f)^2 .$$

Because the square root is monotonic, the intensity has its extrema at the same
bins as the magnitude, while avoiding a square root per bin.
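The extremum test above translates directly into a scan of the intensity spectrum. In this sketch the end bins are never reported as extrema, which is an assumption because the text does not say how the spectrum edges are handled. The function name is illustrative.

```python
def find_extrema(spectrum):
    """Return (maxima, minima) bin indices of the intensity spectrum.

    Intensity I(f) = re^2 + im^2 is used instead of the magnitude; since the
    square root is monotonic, both give the same extrema. End bins are skipped.
    """
    inten = [z.real ** 2 + z.imag ** 2 for z in spectrum]
    maxima, minima = [], []
    for f in range(1, len(inten) - 1):
        if inten[f] > inten[f - 1] and inten[f] > inten[f + 1]:
            maxima.append(f)
        elif inten[f] < inten[f - 1] and inten[f] < inten[f + 1]:
            minima.append(f)
    return maxima, minima
```

For example, a convolved spectrum with magnitudes 0, 1, 3, 1, 0.5, 2, 0.1 has maxima at bins 2 and 5 and a minimum at bin 4.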


Referring to figure 2, each maximum and its associated local minima are used to
define regions (indicated by the shaded arrows in figure 3), which correspond
to audible harmonics in the original audio-frequency signal. The location of
the maximum in the frequency domain corresponds to the perceived pitch of the
harmonic, and the band of frequency-domain information around the
maximum represents any associated amplitude or frequency modulations of that
harmonic. Since it is important not to lose this information, the whole band of
frequencies around the peak is summed to give a signal vector.
In this way the temporal resolution of the analysis will match the
bandwidth of any modulations taking place.
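Forming the regions and their signal vectors can be sketched as follows. The sketch makes three assumptions that the text leaves open: region boundaries are the minima plus the two ends of the spectrum, each minimum bin belongs to the region above it, and a region with more than one maximum (possible only on plateaus) keeps its first. The function name is illustrative.

```python
def signal_vectors(spectrum, maxima, minima):
    """Sum the complex bins of each region into one signal vector.

    Returns a list of (max_bin, vector) pairs, one per region that contains
    a maximum. Regions run from one minimum (or bin 0) up to, but not
    including, the next minimum (or the end of the spectrum).
    """
    bounds = [0] + sorted(minima) + [len(spectrum)]
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        peaks = [m for m in maxima if lo <= m < hi]
        if peaks:
            result.append((peaks[0], sum(spectrum[lo:hi], 0j)))
    return result
```

Summing the whole band, rather than keeping only the peak bin, retains the side-band energy that carries modulation within the frame.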
Each of the regions is processed separately according to the following
technique. First, an accurate estimate of the location of each maximum is determined.
Referring to figure 3, lower graph, the large arrow a (300) is the difference
between the smallest of the three intensities (max-1) and the
maximum intensity (max). The small arrow b (310) is the difference between
the smallest (max-1) and the intermediate intensity (max+1). The ratio of the
two is used to offset the integer maximum value.
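The text defines a and b but not the exact mapping from their ratio to the offset. The sketch below therefore assumes an offset of 0.5 × b/a towards the larger neighbour. This choice gives no offset when the neighbours are equal and a half-bin offset when the larger neighbour is as strong as the peak. The function name is illustrative.

```python
def refine_peak(inten, m):
    """Fractional location of the intensity maximum at integer bin m.

    a = I(m) - smaller neighbour, b = larger neighbour - smaller neighbour.
    Assumption: offset = 0.5 * b / a, applied towards the larger neighbour.
    """
    left, centre, right = inten[m - 1], inten[m], inten[m + 1]
    lo, hi = min(left, right), max(left, right)
    a, b = centre - lo, hi - lo
    if a <= 0:                      # not a proper peak; leave it on the bin
        return float(m)
    offset = 0.5 * b / a
    return m + offset if right > left else m - offset
```

For intensities 1, 4, 3 around bin 1, a = 3 and b = 2, so the refined location is 1 + 1/3.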



Pitch shifting and time-scale modification are indicated schematically in
figure 1 by the numeral 130. At this point, data reduction (133) and
transmission/storage (134) steps are illustrated in figure 1 as alternative options.

The manipulated data are re-synthesised according to the following method.
For the ith analysed frequency component, vector(i) has a real-valued location
y in the frequency-domain output.

y is rounded down to the largest integer less than or equal to y, denoted z;
thus z = Int(y).



The output bins z and z+1 are then added to with vector(i), each in proportion to 1
minus the difference between y and that bin's integer location:

Bin[z] = Bin[z] + [1-(y-z)] vector(i)
Bin[z+1] = Bin[z+1] + (y-z) vector(i)

where all operations are carried out on complex numbers.
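The two-bin accumulation can be written directly in Python. For a pitch shift, y would be the component's refined bin location multiplied by the pitch factor. The function name and the guard that skips bin z+1 when y is an integer are choices made for this sketch.

```python
import math

def accumulate(bins, y, vector):
    """Add a complex signal vector at real-valued bin location y, splitting it
    linearly between bins z = floor(y) and z + 1."""
    z = math.floor(y)
    frac = y - z
    bins[z] += (1 - frac) * vector      # weight 1 - (y - z)
    if frac > 0:                        # avoids indexing past the end when y is an integer
        bins[z + 1] += frac * vector    # weight (y - z)
```

For example, a vector 2+2j placed at y = 1.25 contributes 1.5+1.5j to bin 1 and 0.5+0.5j to bin 2.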

To modify the time scale or pitch of the analysed signal, it is necessary to
compensate for any phase shifts so that the synthesised output is consistent (i.e.
glitch free). To this end, the output signal in any one frame is moved forward
in time by a fixed number of samples. Therefore, for a given pitch
measurement it is possible to determine how much the output phase should
change so that the output joins smoothly with the previously synthesised
frame.

However, the input time frame moves by a different number of samples,
so the analysed phase values are already changing as the analysis
window moves through the input data.

Therefore the difference between the rate of change of the input phase and the
required rate of change of the output phase is calculated. This difference
is a measure of how fast to rotate the phase of the frequency-domain
data between analysis and synthesis. Each of the signal vectors
defined above has a frequency measurement. This measurement is used to
calculate how quickly to spin a vector of magnitude 1, represented as a
complex number. This unit vector is multiplied by the signal
vector to provide the necessary phase shift for synthesis without affecting the
timing of the decay characteristics or other modulations of each region.
This phase shift (in radians) is given by:

phase(i)

where t_r is the reconstruction time step in samples, t_a is the analysis time step in
samples and t_w is the FFT size in samples.



Since the measurement of frequency provides a measure of phase difference 
between one synthesis frame and the next, these differences must be summed 
cumulatively as synthesis proceeds. 

The cumulative sum applies only to one region, therefore regions must be 
tracked from one synthesis frame to the next. 

A convenient data structure has been developed to track regions from one
frame to the next and is described with reference to figures 4a and 4b. One
integer array holds, for every bin in a region, the location of that region's
local maximum. A corresponding array contains the last phase value (in
radians) used to rotate that region's phase. The phase value is stored in the bin
with the same index as the location of the maximum.

Therefore, when a new frame is analysed and local maxima are detected, the
location of each maximum is used to index into the integer array. This provides
the index of the maximum that existed in the previous frame. This index is
then used to access the array containing the last phase value used for the
corresponding region in the previous synthesis frame. This is illustrated in
figures 3a and b, in which an analysis frame n is shown along with the
nearest maxima array and the phase array. In analysis frame n+1,
the first frequency maximum is at bin 7. The seventh element of
the nearest maxima array from the previous frame is 5. The fifth element of
the phase array from the previous frame n is 12 degrees. This value is
updated using the frequency estimate of the local maximum and then stored in the phase
array for the next frame at position 7. For the second region 410, the
thirteenth element of the nearest maxima array from the previous analysis
frame n is 16. From the phase array of the previous analysis frame n, the
phase is 57 degrees. A frequency estimate is used to update this phase
value, which is placed at position 13 of the next phase array.
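The region-tracking arrays and the unit-vector rotation can be sketched as follows, reproducing the worked example above in radians. The per-region phase increment is treated as a hypothetical input computed elsewhere from the frequency estimate, because the expression for phase(i) is not reproduced in this text. All function names are illustrative.

```python
import cmath
import math

def nearest_max_array(n_bins, regions):
    """Integer array in which every bin of a region holds that region's maximum.

    regions: list of (lo, hi, max_bin) tuples covering bins lo..hi-1.
    """
    arr = [0] * n_bins
    for lo, hi, max_bin in regions:
        for b in range(lo, hi):
            arr[b] = max_bin
    return arr

def track_phases(new_maxima, increments, prev_nearest_max, prev_phase, n_bins):
    """Carry each region's cumulative synthesis phase from frame n to frame n+1.

    increments: phase advance (radians) per new maximum; a hypothetical input
        assumed to be derived elsewhere from the region's frequency estimate.
    prev_nearest_max / prev_phase: the two arrays from frame n.
    """
    next_phase = [0.0] * n_bins
    for m, inc in zip(new_maxima, increments):
        old_max = prev_nearest_max[m]              # old region this peak belongs to
        next_phase[m] = prev_phase[old_max] + inc  # cumulative phase, stored at the new max
    return next_phase

def rotate(vector, phase):
    """Apply a phase shift by multiplying with the unit vector e^(j*phase)."""
    return vector * cmath.exp(1j * phase)

# Worked example with zero increments: bin 7 inherits 12 degrees via old
# maximum 5, and bin 13 inherits 57 degrees via old maximum 16.
prev_nearest = [0] * 20
prev_nearest[7], prev_nearest[13] = 5, 16
prev_phase = [0.0] * 20
prev_phase[5], prev_phase[16] = math.radians(12), math.radians(57)
nxt = track_phases([7, 13], [0.0, 0.0], prev_nearest, prev_phase, 20)
```

Indexing through the maximum's bin means a peak that drifts by a few bins between frames still finds its region's phase, provided it stays inside the old region.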


A frequency-domain representation of the signal is constructed from the known
signal components by adding each signal vector to the
frequency-domain output array. Since the frequency locations are real-valued,
the energy from a signal vector is distributed between the nearest two
(integer-valued) bin locations. The frequency-domain representation is then inverse
Fourier transformed (150 in figure 1, page 16) to provide a time-domain
representation of the synthesised signal. Since the signal was analysed with
differing temporal resolutions at different frequencies, the synthesised
time-domain signal is only valid in the region equivalent to the highest temporal
analysis resolution used. To this end, the synthesised time-domain signal is
windowed (160) with a (relatively) small positive cosine window (170) before
being added (172) in an overlapping fashion to the final synthesised signal
(180).

WO 00/13172    PCT/NZ99/00143
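The final synthesis stage can be sketched as below. The sketch assumes the zero-centred frame layout used in analysis, so the valid samples lie around index 0 and wrap to the end of the array. A plain O(N²) inverse DFT stands in for the inverse FFT, and taking the real part assumes a conjugate-symmetric (real-signal) spectrum. The short window length and hop are hypothetical parameters. With a raised-cosine window and a hop of half its length, the overlapping windows sum to one.

```python
import cmath
import math

def synthesise_frame(spectrum, short_len):
    """Inverse-transform one output spectrum and keep a short windowed segment
    centred on time zero (indices wrap), returned in natural time order."""
    n = len(spectrum)
    frame = [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)).real / n
             for t in range(n)]
    half = short_len // 2
    segment = []
    for i in range(-half, half):                        # samples around time zero
        w = 0.5 + 0.5 * math.cos(math.pi * i / half)    # raised cosine, peak at i = 0
        segment.append(w * frame[i % n])                # i % n wraps negative indices
    return segment

def overlap_add(segments, hop):
    """Add equal-length segments into one output, successive segments `hop` apart."""
    seg_len = len(segments[0])
    out = [0.0] * (hop * (len(segments) - 1) + seg_len)
    for j, seg in enumerate(segments):
        for i, value in enumerate(seg):
            out[j * hop + i] += value
    return out
```

For example, a 16-bin spectrum whose only non-zero entry is DC (value 16) inverse-transforms to a constant frame of ones. Overlap-adding five 8-sample segments with a hop of 4 then reproduces ones throughout the fully overlapped interior.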

An alternative, but equivalent, method of manipulating the information to
achieve pitch shifting and time stretching is as follows.

The alternate method is substantially similar to the first method, sharing
identically the steps of windowing (420), Fourier transforming (450), filtering
(460), and minima and maxima detection (490). The major difference between the
two methods comes after this point. Whereas the first method sums the contents of
each region into a signal vector (110), the alternate method instead explicitly
retains the contents of each region (510). The contents of each region are then
translated and scaled in accordance with the pitch shift and time stretch factors
respectively (530). For a pitch shift operation, the contents of a region are
translated such that the maximum is scaled in frequency. For a time stretch
operation, the contents of a region are scaled by the time stretch factor, but so
that the maximum does not change in frequency.
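The per-region translate-and-scale step of the alternate method might look like the sketch below. The exact mapping formula for y is not recoverable from this text. The sketch therefore exposes a hypothetical `side_scale` parameter, which scales each bin's distance from the maximum, and does not fix its relationship to the time expansion ratio. The dictionary layout of a region is also an assumption, and phase compensation is omitted.

```python
import math

def map_region(region, max_bin, new_max, side_scale, out):
    """Accumulate a region's retained complex bins into the output array `out`.

    region: dict {bin: complex value} holding one region's contents.
    new_max: output location of the maximum (pitch_factor * max_bin for a
        pitch shift, max_bin for a pure time stretch).
    side_scale: hypothetical factor on each bin's distance from the maximum
        (1.0 means pure translation).
    Each real-valued target is split linearly between its two nearest bins.
    """
    for x, v in region.items():
        y = new_max + side_scale * (x - max_bin)
        z = math.floor(y)
        frac = y - z
        if 0 <= z < len(out):
            out[z] += (1 - frac) * v
        if frac > 0 and 0 <= z + 1 < len(out):
            out[z + 1] += frac * v
```

With a pitch factor of 1.5, a region centred on bin 10 is moved intact so that its maximum lands on bin 15. With the maximum held at bin 10 and side_scale = 0.5, the same region is compressed about its peak.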

Phase shift compensation is carried out substantially as described above with
reference to figures 4a and 4b. To synthesise the output, the frequency-domain
data to be synthesised is copied a region at a time from the unaltered output of the
Fourier transform step. The contents of each region are accumulated into the
output frequency-domain buffer in the same fashion as in the first method.

There exist variations in the implementation of these two techniques that will
be clear to one skilled in the art. However, the key feature of the present
invention resides in using a control function s(f) to vary a frequency-domain
filter at different frequencies. This brings about a windowing effect on the
equivalent time-domain data that varies with frequency. In the case of
processing audio-frequency signals, this control function is chosen to reflect
the response of the human cilia to a range of audio frequencies. Although the
shape of this curve is determined empirically, it is possible that other curves
may prove suitable for other manipulative techniques and applications.

A further feature of the present invention resides in the identification and
location of the maxima and associated minima. The presently disclosed
technique is computationally highly efficient and allows rapid, high-quality
time stretching and pitch shifting of audio signals.

Experimentally, it has been shown that the present technique produces a sound
with significantly enhanced tonal qualities, and it is believed that this is largely
achieved through the preservation of the harmonic information in the
sidebands of the local frequency maxima.
In terms of a practical implementation of the present invention, it is envisaged
that the technique may be implemented in software or alternatively in
hardware. In the latter case, the hardware may form part of an audio
component such as an audio player. Potential applications of the invention
include the sound recording industry, where audio signal processing/synthesis
is commonly required to meet very high standards of reproduction quality.
Alternative applications include those in the entertainment industry, and it is
anticipated that the technique may find application in sound
reproduction/transmission systems where variations in pitch or tempo may be
desirable. It is further anticipated that applications may exist in general signal
processing, data reduction and/or data transmission and storage. In the latter
case, the selection of the particular convolution function may vary.

Where in the foregoing description reference has been made to elements or
integers having known equivalents, then such equivalents are included as if
they were individually set forth.

Although the invention has been described by way of example and with
reference to particular embodiments, it is to be understood that modifications
and/or improvements may be made without departing from the scope of the
appended claims.
CLAIMS:

1. A method of encoding and re-synthesising a waveform, the
method including the steps of:

sampling the waveform to obtain a series of discrete
samples and constructing therefrom a series of frames,
each frame spanning a plurality of samples;

multiplying each frame by a windowing function
wherein the peak of the windowing function is centred
substantially at a zero point of each frame;

applying a Fast Fourier Transform to each frame, thereby
producing a frequency-domain waveform;

convolving the resultant frequency-domain data with a
variable kernel function, the specification of the variable
kernel function varying with frequency;

locating local maxima and surrounding minima in the
magnitude spectrum of each convolved frame, wherein
the local maxima and their associated minima define a
plurality of regions, each region corresponding to a
frequency component of the signal; and

analysing each of the regions in the frequency-domain
representation separately by summing the complex
frequency components or bins falling within the defined
region into a signal vector; wherein the variable kernel
function can be varied to achieve a differing
trade-off between frequency and temporal resolution across
the frequency range of the signal.

2. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein the windowing function is a raised
cosine function.

3. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein the waveform corresponds to a
digitised audio-frequency waveform and the kernel function
is varied to approximate the perceptual characteristics of the
human ear.

4. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein the waveform corresponds to an
audio signal, and the location of each maximum corresponds to the
perceived pitch of the frequency component.

5. A method of encoding and re-synthesising a waveform as
claimed in claim 1 further including the step of manipulating the
signal while it is represented as signal vectors.
6. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein said manipulation takes the form of
modifying pitch or time scale (in an audio signal) or further data
reduction adapted for efficient signal storage and/or
transmission.

7. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein, in the case of modifying an audio
signal, the frequency location and phase of analysed signal
vectors are shifted according to a predetermined amount to
achieve a scaling of time and/or pitch.

8. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein converting back to the sampled time-domain
representation of the signal is achieved by accumulating
into the frequency domain an equivalent signal whose
components correspond to those signal vectors determined in
the analysis of the original signal.

9. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein an Inverse Fast Fourier Transform is
applied so as to give a time-domain signal that may be suitably
windowed and accumulated to produce the decoded signal.

10. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein the form of the convolution function
is determined empirically by subjectively assessing the quality
of the synthesised output.

11. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein the application of the kernel 
function to the frequency domain data is implemented as a 
single-pole low-pass filter operation on said data, the pole's 
location being varied with frequency. 



12. A method of encoding and re-synthesising a waveform as
claimed in claim 11 wherein, in the case of the analysis of audio
signals, the pole is specified by a control function s(f) of the
form:

$$s(f) = 0.4 + 0.26\arctan\big(4\ln(0.1f) - 18\big)$$

where f is the frequency in hertz (cycles per second).
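The behaviour of the pole control function can be checked numerically. This sketch assumes the function reads s(f) = 0.4 + 0.26 arctan(4 ln(0.1 f) - 18), with f in hertz; the constants 0.4 and 4 ln are a reading of a damaged scan, and the helper name `pole` is hypothetical.

```python
import math


def pole(f_hz: float) -> float:
    """Hypothetical helper: s(f) = 0.4 + 0.26*atan(4*ln(0.1*f) - 18), f > 0 Hz."""
    return 0.4 + 0.26 * math.atan(4.0 * math.log(0.1 * f_hz) - 18.0)


if __name__ == "__main__":
    # s(f) rises from near 0 at low frequencies towards about 0.8 at the top of
    # the audio band, passing 0.4 where 4*ln(0.1*f) = 18, i.e. f = 10*e**4.5.
    for f in (20.0, 100.0, 900.0, 5000.0, 20000.0):
        print(f"{f:8.0f} Hz  s = {pole(f):.3f}")
```

A pole near 0 leaves low-frequency bins almost unsmoothed (fine frequency resolution), while a larger pole at high frequencies widens the effective kernel, which corresponds to a shorter effective time window and hence finer temporal resolution. This matches the frequency-dependent trade-off the kernel is meant to provide.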
13. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein the frequency domain filter may be
specified by the relation:

$$y_{out}(f) = \big[1 - s(f)\big]\,x_{in}(f) + s(f)\,y_{out}(f - 1)$$

where x_in(f) and y_out(f) are the unfiltered and filtered
frequency-domain values at bin f and s(f) is the pole location.
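A single-pole low-pass filter run across the frequency bins, with a pole that may differ at every bin, can be sketched as below. This assumes the relation y_out(f) = [1 - s(f)] x_in(f) + s(f) y_out(f - 1), a single pass from the lowest bin upwards, and a starting value y_out(-1) = 0; the function name `smooth_spectrum` is hypothetical.

```python
def smooth_spectrum(x, s):
    """One-pole recursive smoothing across bins with a per-bin pole s[k].

    y[k] = (1 - s[k]) * x[k] + s[k] * y[k - 1], with y[-1] assumed to be 0.
    x holds complex bin values; s holds pole locations in [0, 1).
    """
    y = []
    prev = 0j
    for xk, sk in zip(x, s):
        prev = (1.0 - sk) * xk + sk * prev
        y.append(prev)
    return y
```

A pole of 0 passes a bin through unchanged, and larger poles spread each bin's energy over more of the bins above it, which is how a varying pole realises a kernel whose width varies with frequency.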

14. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein, for the purposes of manipulating an
audio signal, each signal vector is treated separately; for pitch
shifting, the frequency of the component is multiplied by a
real-valued pitch factor; and for both pitch shift and time-scale
modification, the phase shift necessary for glitch-free
reconstruction is calculated and applied.

15. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein the method includes the further steps
of:

zeroing a frequency domain output array, and for each
analysed frequency component represented as an analysed
signal vector:

mapping the real-valued frequency to the two nearest
integer-valued frequency bins; and

distributing the analysed signal vector between the two
bins, each bin receiving a share in proportion to 1 minus the
difference between the real-valued frequency and that bin's
location.
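The accumulation steps above amount to linear interpolation of each signal vector onto the integer bin grid. A minimal sketch, assuming frequencies are expressed in (fractional) bin units and that a share falling outside the output array is simply dropped; the function name `accumulate` is hypothetical.

```python
import math


def accumulate(num_bins, components):
    """Zero an output array, then add each (freq, vector) pair to the two
    nearest integer bins with weight 1 - |freq - bin| for each bin."""
    out = [0j] * num_bins
    for freq, vec in components:
        lo = math.floor(freq)
        frac = freq - lo
        for k, w in ((lo, 1.0 - frac), (lo + 1, frac)):
            if 0 <= k < num_bins:  # assumed: shares outside the array are dropped
                out[k] += w * vec
    return out
```

For example, a vector of 4 at frequency 1.25 puts 3 into bin 1 and 1 into bin 2, so the two shares always sum to the original vector.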



16. A method of encoding and re-synthesising a waveform as
claimed in claim 1 wherein each resulting region in the
frequency domain is translated, around its maximum, to a
different frequency, the new position of the maximum being a
multiple of the original frequency of the maximum, so that the
location of the maximum is scaled while the surrounding
region is translated.



17. A method of encoding and re-synthesising a waveform as
claimed in claim 16 wherein, for pitch shifting of an audio signal,
for each region having a maximum and first and second
associated minima, the location of the maximum in the frame is
scaled and the associated harmonic information lying between
the first minimum, the maximum and the second minimum is
translated to the corresponding positions around the scaled
maximum.

18. A method of encoding and re-synthesising a waveform as
claimed in claim 16 or 17 wherein, to time-stretch the signal,
each maximum is retained in the same location in the frequency
domain while the band of frequency-domain or harmonic
information associated with the maximum is compressed, thereby
stretching the amplitude and frequency modulation of the
harmonics while preserving the pitch of the input signal.
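The two region manipulations described in the preceding claims differ only in where each bin of a region is moved. A minimal sketch, assuming bin positions are real numbers in bin units and that `factor` is the real-valued pitch factor and `ratio` the time expansion ratio; both function names are hypothetical.

```python
def pitch_shift_location(x, peak, factor):
    """Pitch shift: the maximum moves to factor * peak, and every other bin of
    the region keeps its offset from the maximum (pure translation)."""
    return factor * peak + (x - peak)


def time_stretch_location(x, peak, ratio):
    """Time stretch: the maximum stays put, and offsets from it are divided by
    the time expansion ratio, compressing the band around the maximum."""
    return peak + (x - peak) / ratio
```

Dividing the offsets by the expansion ratio is what slows the amplitude and frequency modulation: sidebands at a distance d from a harmonic represent modulation at rate d, so moving them to d / ratio stretches that modulation in time while the harmonic itself, and hence the pitch, stays where it was.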

19. A method of encoding and re-synthesising a waveform as
claimed in claim including the further steps of:

resampling the data in each of the frames into a plurality
of bins; and

mapping each bin to a real-valued location in an output
frame where, for a bin x lying within a band with a
maximum at a frequency freq_max, the real-valued location y in
the output frequency domain is

$$y = \mathit{shift}\cdot\mathit{freq}_{max} + \frac{x - \mathit{freq}_{max}}{\mathit{scale}}$$

where shift equals the frequency shift and scale equals the
time expansion ratio.
20. A method of encoding and re-synthesising a waveform as
claimed in claim 19 wherein y is rounded down to the nearest integer
z which is less than or equal to y, and output bins z and z + 1 are
then each added to in proportion to 1 minus the difference between y
and that bin's integer location.
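The bin remapping and two-bin accumulation can be combined into one pass over a region. This sketch assumes the mapping y = shift * freq_max + (x - freq_max) / scale, with shift treated as a multiplicative pitch factor and all positions in bin units; the function name `remap_region` and its argument layout are hypothetical, and shares landing outside the output frame are dropped.

```python
import math


def remap_region(spectrum, lo, peak, hi, shift, scale, out):
    """Move bins lo..hi-1 of one region (maximum at bin `peak`) into `out`.

    Assumed mapping: y = shift * peak + (x - peak) / scale.  y is rounded
    down to z, and the bin value is split between bins z and z + 1 with
    weights 1 - (y - z) and y - z respectively.
    """
    for x in range(lo, hi):
        y = shift * peak + (x - peak) / scale
        z = math.floor(y)
        frac = y - z
        for k, w in ((z, 1.0 - frac), (z + 1, frac)):
            if 0 <= k < len(out):
                out[k] += w * spectrum[x]
    return out
```

With shift = 2 and scale = 1 a region is translated so that its maximum lands at twice its original bin; with shift = 1 and scale = 2 the maximum stays put and the region's bins are squeezed into half the width, their values shared between neighbouring output bins.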

21. A software application operating in accordance with the method 
of claims 1 to 20. 

22. A device constructed to perform in accordance with the method
of claims 1 to 20.